A High Performance 256K (512K) Static ROM

EDISON FONG, JIASHANN CHANG, and WEN PERNG TAI 

Abstract: A 256K ROM, fully expandable to 512K, has been fabricated.
The ROM utilizes a 2.5 µm NMOS multiple threshold technology. It has a
typical access time of 120 ns and uses push-pull circuitry to achieve an
active current of 60 mA and a standby current of 1.0 mA. Total chip size is
39K mil² for the 256K version. A modified X cell has been chosen which
requires 7.5 µm × 7.5 µm. Current sensing was chosen to optimize access
time.



Introduction

The performance of MOS ROM's has increased extensively in the past
five years, from 16K densities ten years ago to a 4 Mbit version for
character generation reported in 1980 [1]. Access times have ranged from
45 ns to several hundred nanoseconds, depending on the density and
application. Although EPROM's and EEPROM's have made significant
advancements in the past few years, ROM's will remain dominant where
the requirements of high volume, high density, high reliability, and low
cost are of primary concern. The ROM described in this paper is a
general-purpose ROM, and thus compromises were made to optimize
overall performance. The 256K (32K×8) ROM is fully expandable to
512K with minimum changes. It has a 60 mA typical active current with a
1.0 mA standby current. The low standby power is attributed to the
extensive use of push-pull circuitry and a novel back bias generator.
Because of its low standby power, it is feasible for battery-operated
equipment. The ROM is fully static and is programmed via an ion
implantation which increases the threshold voltage of selected cells in the
memory array. This technique is more economical than the traditional
diffusion technique, which must occur early in the processing sequence
and thus extends lead time.

The architecture of the chip is shown in Fig. 1, with the die photo of the
256K version shown in Fig. 2(a). It is organized in eight 512×64 groups
for the 256K version and eight 1024×64 groups for the 512K version.
Through the use of predecoding, there are 128 NOR X decoders (256 for
the 512K), one for every four rows of cells. The cell chosen is a modified
X cell [2] configuration requiring only 7.5 µm × 7.5 µm. To optimize
access time, current sensing was incorporated as described by Wong et al.
[3].

Process Technology

The ROM is fabricated on a basic 2.5 µm gate length, 500 Å NMOS
polysilicon technology. Although CMOS would require less power, it was
considered too costly, and the loss in density would not meet the
requirements mentioned previously. Five thresholds are available, which
include a hard depletion (-2.5 V), a soft depletion (-0.7 V), a soft
enhancement (0.0 V), a hard enhancement (0.7 V), and a high threshold
(+7.0 V) for programming. A polysilicon-to-diffusion contact is also
available.

Address Buffer 

The address buffers consist of three inverter stages driving a push-pull
output stage. The requirements of low power, minimum propagation
delay, and in specific cases, high V_OUT are important. Probably the most
critical stage is the first inverter in that it must be TTL compatible, even
with variations in supply and process. To ensure TTL compatibility, the
device dimensions of the first stage were designed conservatively.

The difference in this buffer is in the output stage. To conserve active
power, soft enhancement devices were used as the output pull-ups.
Unfortunately, this limits the maximum output level. If only soft
enhancements were used, the maximum voltage for a high level would be
2V_TSE (two soft enhancement thresholds) below V_CC. This is due to the
output device and the power-down device of the previous stage. To
minimize this limitation, small depletion devices were placed in parallel
with the soft enhancement output devices. This increases the active power
by a minimum amount, but allows the output to reach V_CC. In the
power-down mode, current paths to ground are disabled. Both output nodes
of the address buffers go to approximately V_CC/2 in the power-down mode.
This configuration allows for an improved chip enable time since the
outputs of the buffer need only swing V_CC/2 when the chip is selected.

Fig. 1. Architecture of the 256K ROM. The 512K version uses 1024 rows.

X Decoder 

The design of X decoders is always difficult since compromises must be
made. The X decoder in the 256K ROM requires the following three
features. First, the overall power of all the decoders must be kept
extremely low since there are 128 of them. Second, the circuitry must fit
in the memory cell pitch, which eliminates complex schemes such as
bootstrapping. Lines WL1-WL4, as shown in Fig. 3, expand the horizontal
length of the X decoder, which consumes significant silicon area. Third,
the word line should have good voltage swings to provide good signal
margins, which lead to improved access times.

To alleviate the first problem, predecoding was used to drop the X
decoder count to 128 rather than 512 in the 256K version. In addition,
this allowed for a 30 µm pitch for each decoder. Poly-to-diffusion spacing
was reduced to zero to fit the decoder in the prescribed pitch. This
technique increased parasitic capacitance, but did not impair the
functionality of the chip.
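
The organization and decoder figures quoted so far can be cross-checked
with a few lines of arithmetic. The following Python sketch is only an
illustration: the function and constant names are hypothetical, and the
30 µm decoder pitch is assumed to be simply four rows at the 7.5 µm cell
pitch.

```python
# Cross-check of the array organization and X-decoder count.
# All numeric inputs are taken from the text; names are illustrative only.

CELL_PITCH_UM = 7.5      # cell edge length quoted for the modified X cell
ROWS_PER_DECODER = 4     # one NOR X decoder serves four rows of cells


def organization(rows, cols=64, groups=8):
    """Return (total bits, number of NOR X decoders) for one version."""
    bits = groups * rows * cols
    decoders = rows // ROWS_PER_DECODER
    return bits, decoders


bits_256k, dec_256k = organization(512)    # 256K version
bits_512k, dec_512k = organization(1024)   # 512K version

# Assumption: the decoder pitch equals four cell pitches.
decoder_pitch_um = ROWS_PER_DECODER * CELL_PITCH_UM

print(bits_256k, dec_256k, bits_512k, dec_512k, decoder_pitch_um)
```

Under these assumptions the 256K version comes out as 32K×8 bits with 128
decoders, the 512K version has 256 decoders, and each decoder has a
30 µm pitch, in agreement with the text.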

Lines A01, A02, A03, and A04 of Fig. 3 come from two addresses (A5
and A6), which determine a one-of-four selection from a predecoder in
the periphery. This predecoder consists solely of push-pull circuitry with
bootstrapping to preserve speed, output voltage swing, and power
consumption.

The NOR gate devices of the X decoder come from predecoded lines
utilizing addresses Al y AS, A9, All, All, Al\ and ,414. These decoders 
are push-pull, but contain no bootstrapping circuitry since the outputs
feed into gates of inverters. Each pull-down device in the NOR
configuration (M1-M4) has four predecoded metal lines associated with it,
giving a total of 16 metal lines.

The majority of the active current is consumed in the X decoder. This is
due to the NOR configuration. Push-pull circuitry could not be used due
to the 30 µm pitch limitation. The maximum output of the word line is
one V_TSE (one soft enhancement threshold) below V_CC. It is thus crucial
for lines A01-A04 to reach V_CC to preserve speed.

Y Decoder 

The Y decoder uses floating grounds to achieve a select. A schematic of 
one section of the decoder is shown in Fig. 4. For each bit, 16 sections of 
what is shown in Fig. 4 are used. When a memory cell (M1) is selected,
one side of the cell is grounded through M2, while the other side
completes a path to the sense amplifier. The advantage of this technique is
that the cell is kept very small, as shown in Fig. 5. The disadvantage of
this technique is that the decoding is more complex. Y predecoders were
designed using push-pull techniques, but bootstrapping was not used due
to its increased silicon area.

Fig. 2. (a) Die photograph of the ROM (256K). (b) Typical access time
photograph.

Cell Structure 

Fig. 5 shows the modified X cell configuration. This cell was chosen
because of its compact size as compared to a conventional T cell approach
which would have required 8 µm × 8.5 µm with the same process design
rules. The X cell minimizes area because the poly-to-contact spacing
becomes the limiting design rule. The conventional T cell is
contact-space-limited which requires, typically, 20 percent more area. The
X cell has been modified slightly by clipping the edges of the
metal-to-diffusion contact, which results in a 20 percent savings in area as
compared to the conventional X cell previously reported [2]. The effective
transistor size is approximately 3.5/2.5 µm.
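
The T cell comparison can be verified directly from the two sets of
dimensions given above. The short Python sketch below does so; the
variable names are illustrative only and are not taken from the design.

```python
# Area comparison between the modified X cell and a conventional T cell,
# using only the cell dimensions quoted in the text (in micrometers).

X_CELL_UM = (7.5, 7.5)   # modified X cell
T_CELL_UM = (8.0, 8.5)   # conventional T cell, same design rules


def cell_area(dims):
    """Area in square micrometers of a rectangular cell."""
    width, height = dims
    return width * height


x_area = cell_area(X_CELL_UM)
t_area = cell_area(T_CELL_UM)

# Extra area of the T cell relative to the modified X cell, in percent.
extra_pct = (t_area / x_area - 1.0) * 100.0

print(f"X cell: {x_area} um^2, T cell: {t_area} um^2, "
      f"T cell penalty: {extra_pct:.1f}%")
```

The T cell works out to roughly 21 percent larger than the modified X
cell, consistent with the "typically 20 percent" figure quoted above.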

In designing with the X cell, other problems arise, primarily parasitic
currents which decrease signal margins. Fig. 4 illustrates such an example.
Suppose that M1 of Fig. 4 is selected. This requires the first bit line to go
low. Simultaneously, M2 and M3 must be "on," which is determined by
the Y predecoder. If M1 has a transistor, then bit line 2 is low. However,
sharing this node is M4. If, again, there is a transistor present at M4, it
will pull up bit line 2. This parasitic path decreases signal margins on an
already small cell (3.5/2.5). One technique to minimize this problem is to
have the bit line pull-ups at the sense amplifier. In the present design, M6
will precharge bit line 2, but the sense amp will pull bit line 2 higher than
that available from M6. Thus, minimum current will flow through M4.
Voltage VRA is designed so that the voltage which appears at the bit line
is slightly lower than that of the sense amplifier. When the sense amplifier
is activated, most of the current flows through the sense amplifier. This
technique improves access time because the bit line is already charged
prior to a select. However, this scheme reduces the signal current into the
sense amp since the current path to ground for M6 is the memory cell. To
minimize the current in M6 when a cell is selected, it is crucial for VRA to
be set low so that the bit line is below the inherent voltage of the sense
amp. In the design presented in this paper, the bit line voltage was set to
approximately 2.0 V. In addition, VRA must track the sense
amplifier pull-up to tolerate various processing and operating conditions.

Fig. 3. Schematic diagram of the X decoder.

Sense Amplifier and Reference Generator 

Current sensing was chosen over voltage sensing for three reasons.
First, the input of the amplifier is a low impedance node; thus, bit-line
charging is improved. The bit-line charging task is then shared by both the
pull-up device and the sense amplifier. Second, the input node to a
current sense amplifier typically sits at a lower voltage than a voltage
sensing scheme using a differential pair; thus, the bit line need not charge
as high. Third, the bit-line voltage need not change in current sensing;
thus, the bit-line capacitance is not as dominant in determining the access
time.

There are two disadvantages to current sensing in the ROM. First, if 
the resistance path from the cell to the sense amp is large, the cell will not 
experience the low-impedance characteristics of the sense amp, and thus 
the advantages of current sensing diminish. Second, the resistance of the
data path from the cell to the sense amp must be balanced with the
reference circuitry since this will determine the sensing current. Any
imbalance, even in the dc condition, will degrade performance. Current
flows through
the pass transistors in the dc state in current sensing as opposed to voltage 
sensing where current drain to the sense amp is ideally zero in the steady 
state. Therefore, special care must be taken in the layout to assure 
symmetry with the reference and memory cell path. 

The scheme chosen was a configuration similar to that reported by
Wong et al. [3]. It is presently used on several commercially available
products [4]. Independent references were used for each output to
improve signal margins during transient. This technique minimizes ac and
dc interaction between the different outputs, resulting in improved speed
performance, even though it consumes more silicon area.

Back Bias Generator 

To meet the stringent standby power requirements of the chip, the back
bias generator is designed to the following three specifications. First, it
has a current drain of no more than 1 mA. Second, it performs throughout
the temperature range with good stability. Third, it is supply independent
to the first order.

A block diagram of the generator is shown in Fig. 6. It consists of a ring
oscillator, buffer, capacitive pump, and a regulator. The ring oscillator is a
five-stage inverter ring. The pull-ups in the oscillator were chosen as soft
depletion transistors. This configuration draws much less operating
current than depletion loads for a given silicon area. In addition, the
pull-up devices stay in the active region longer, resembling that of an
ideal load.

Fig. 6. Schematic of the back bias generator.

Soft enhancement devices used in the buffer stage are driven in a
push-pull configuration to conserve power. Charging the two pump
capacitors requires large pull-up devices which would draw unacceptable
power if the loads were depletion devices. The tradeoff of using soft
enhancement devices is that the output of the buffer will not swing to the
full power supply range. To circumvent this problem, bootstrapping was
used (C1 and C2) with small depletion pull-ups as the drivers to boost the
gates of M1 and M2 to about +8.0 V.

The actual pumping is performed with C3 and C4 in conjunction with
M3, M4, and M5 in a diode-connected configuration for clamping. A
two-stage pump was chosen to pump to a maximum of -7.0 V. To
stabilize the substrate voltage, an inverter (M6 and M7), with the gate of
the driver connected to the substrate, is used to disable the ring oscillator
when V_BB is less than V_TD. This configuration is stable throughout supply
voltage variations, temperature, and load conditions up to 40 µA.
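
The regulating action described above can be pictured with a simple
behavioral model. The Python sketch below is not a circuit simulation and
is not taken from the design: the pump gain, leakage rate, step count, and
the choice of the hard depletion threshold (-2.5 V) for V_TD are all
assumptions made purely for illustration.

```python
# Behavioral sketch of a bang-bang back-bias regulator.
# Assumptions (not from the paper): V_TD is taken as the hard depletion
# threshold, and PUMP_GAIN / LEAK are arbitrary per-step rates.

V_TD = -2.5         # assumed regulation threshold (V)
V_PUMP_MAX = -7.0   # maximum pump voltage quoted in the text (V)
PUMP_GAIN = 0.02    # hypothetical fraction of pump headroom gained per step
LEAK = 0.002        # hypothetical fraction of V_BB lost to the load per step


def simulate(steps=2000):
    """Return the substrate voltage trace for a regulator that stops the
    oscillator whenever V_BB falls below V_TD."""
    v_bb = 0.0
    trace = []
    for _ in range(steps):
        oscillator_on = v_bb > V_TD       # inverter disables ring when low
        if oscillator_on:
            v_bb += PUMP_GAIN * (V_PUMP_MAX - v_bb)   # pump toward -7.0 V
        v_bb += LEAK * (0.0 - v_bb)       # load relaxes substrate to ground
        trace.append(v_bb)
    return trace


trace = simulate()
print(f"final V_BB = {trace[-1]:.2f} V")
```

In this model the substrate voltage is pumped down until it crosses V_TD,
after which the oscillator is gated off and the voltage hovers just below
the threshold rather than running to the full -7.0 V pump limit.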

Experimental Results 

The 256K ROM has a typical access time of 120 ns, as shown in Fig.
2(b), and a typical chip select time of 120 ns. Active current is typically 60
mA with a standby current of 1.0 mA. The die size is 39K mil².
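
For readers more used to metric units, the die size can be converted and
compared with the area of the cell array alone. The Python sketch below is
a rough estimate under the assumption that the array area is simply the
bit count times the 7.5 µm × 7.5 µm cell area; the names are illustrative.

```python
# Die area in metric units and the share taken by the memory cells.
# Inputs are from the text; the array estimate ignores decoders, sense
# amplifiers, and any other overhead inside the array.

MIL_TO_MM = 0.0254        # 1 mil = 0.001 inch = 0.0254 mm
DIE_AREA_MIL2 = 39_000    # 39K mil^2 for the 256K version
CELL_EDGE_UM = 7.5        # modified X cell edge length
BITS_256K = 8 * 512 * 64  # eight 512x64 groups

die_area_mm2 = DIE_AREA_MIL2 * MIL_TO_MM ** 2
array_area_mm2 = BITS_256K * CELL_EDGE_UM ** 2 * 1e-6   # um^2 -> mm^2
array_fraction = array_area_mm2 / die_area_mm2

print(f"die: {die_area_mm2:.2f} mm^2, cells: {array_area_mm2:.2f} mm^2, "
      f"fraction: {array_fraction:.0%}")
```

On this estimate the die is about 25.2 mm², of which the memory cells
themselves occupy roughly 59 percent.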